Introduction


Over the past decade, genetic changes associated with recurrent chromosome breakpoints
have been discovered in human malignancies, predominantly of haematologic origin. The
characterization of these alterations has demonstrated that these changes can be disease
specific and can have functional consequences (e.g. Philadelphia chromosome and CML).
The rationale underlying this proposal is that functional recurrent breakpoint-associated
chromosomal changes occur during breast cancer progression and that
their discovery and characterization may lead to novel diagnostic and therapeutic tools
for breast cancer patients.

By  combining  studies  in  cell  lines,  and  two  different  independent  sets  of  breast  tumors, 
we  can  identify  important  genomic  rearrangements  that  drive  breast  cancer  tumorigenesis. 
By  associating  these  breakpoints  with  known  clinical  parameters,  we  might  be  able  to 
predict  recurrence  or  metastatic  potential  and  thus  determine  better  treatment  strategies. 

By studying the biological consequences of such an aberrant breakpoint in the genome, it
might be feasible to discover new ways of targeting these specific alterations at the
protein level with new compounds. Taken together, this study will determine for
the first time recurrent, biologically relevant genomic changes in breast cancers.

Body 

Task  1:  Determine  the  recurrence  rate  of  the  breakpoints  in  breast  cancer  cell  lines. 

To determine the recurrence rate of the genomic breakpoints in breast cancer cell lines, I
created 2 pools of breast cancer cell line DNA. One pool contained 9 cell lines and the
other 7. In total, 16 cell lines were tested for the presence of the 398 breakpoints. Bands
were present in about 30% of the PCR products. These PCRs were performed before we
analyzed the breakpoints in depth in MCF-7. When we analyzed the breakpoint sequences in
MCF-7/BAC, we discovered that many breaks were induced during the creation of the BAC
library. Other breakpoints were eliminated because they could not be validated on BAC or
MCF-7 cell line DNA, were present in the normal population, or were redundant.

We now have 157 genomic breakpoints in MCF-7 cells that could be confirmed by PCR
across breakpoint joins as likely somatic mutations. A total of 79 genes are involved
in rearrangement events, including 10 fusions of coding exons from different genes and
77 other aberrant breakpoints involving known or predicted genes. Among the breakpoints
that involved genes, we first focused on the 10 gene fusions predicted to lead to fusion
transcripts (see Table 1).

Table 1: Gene fusions discovered in MCF-7 breast cancer cell line

Fusion Genes in MCF-7 Cells

Fusion | Type | Cytobands | Description
ARFGEF2:SULF2 | Intra-Chr Inversion | 20q13.13; 20q13.13 | Fusion of ARFGEF2 exon 1 to SULF2 exons 3-21, 1.2 Mb inversion
DEPDC1B:ELOVL7 | Intra-Chr Inversion | 5q12.1; 5q12.1 | Fusion of DEPDC1B exons 1-7 (out of 11) with ELOVL7 exons 8-9, 127 kb inversion
RAD51C:ATXN7 | Inter-Chr Rearrangement | 3p14.1; 17q22 | Fusion of RAD51C N-terminus exons 1-7 (out of 9) with ATXN7 exons 6-13
SULF2:PRICKLE2 | Inter-Chr Rearrangement | 3p14.1; 20q13.13 | Fusion of SULF2 exon 1 with last exon of PRICKLE2
NPEPPS:USP32 | Intra-Chr Inversion | 17q21.32; 17q23.2 | Fusion of NPEPPS exons 1-12 (out of 23) with USP32 exons 2-34, 13 Mb inversion
ASTN2:PTPRG | Inter-Chr Rearrangement | 3p14.2; 9q33.1 | Fusion of ASTN2 exons 1-10 (out of 22) with PTPRG exons 3-30
BCAS3:BCAS4 | Inter-Chr Rearrangement | 17q23.2; 20q13.13 | BCAS4 exon 1 fused to BCAS3 exons 23-24, Ruan et al. (Genome Res 17:828-838)
BCAS3:RSBN1 | Inter-Chr Rearrangement | 1p13.2; 17q23.2 | Fusion of RSBN1 first exon with BCAS3 exons 6-24
ASTN2:TBC1D16 | Inter-Chr Rearrangement | 9q33.1; 17q25.3 | Fusion of ASTN2 exons 1-15 with TBC1D16 exons 2-12
BCAS4:PRKCBP1 | Intra-Chr Inversion | 20q13.12; 20q13.13 | Fusion of BCAS4 exon 1 with PRKCBP1 exons 5-22, 3.5 Mb inversion
[Figure 1 panels: BP 8: ARFGEF2-SULF2 Fusion; BP 35: RAD51C-ATXN7 Fusion]

Figure 1: Discovery of chimeric mRNA product in MCF-7. RNA was isolated from MCF-7 cells,
and RT-PCR was performed for control regions and the fusion. These data show the
presence of the wildtype transcript in MCF-7 and the two control RNAs (from MCF-10A and
normal breast), while the fusion transcript is present only in MCF-7.


For a gene fusion to have functional significance, it needs to produce an aberrant protein
whose function differs from that of the wildtype proteins. The first step in identifying
such fusions is to determine whether they produce a chimeric mRNA. To determine whether the
predicted chimeric mRNA transcript was created by these genomic fusions, I performed
gene-specific RT-PCR on MCF-7 and 2 normal controls. Out of ten DNA fusions, four
showed a fusion mRNA transcript specifically in MCF-7 by RT-PCR (Figure 1).

Three of these are newly identified (ARFGEF2/SULF2, DEPDC1B/ELOVL7, RAD51C/ATXN7),
and one has been previously described (BCAS4/BCAS3).

If a genomic fusion is present in other breast cancer cell lines, it is unlikely to occur
at exactly the same position. Even if the break occurs several kilobases upstream or
downstream of the breakpoint originally discovered in MCF-7, it might still have the same
downstream consequence by producing the same chimeric mRNA. Because of this, I tested for
the presence of the 4 different chimeric mRNAs in 16 breast cancer cell lines. The 16 breast
cancer cell lines were divided into 4 pools of 4 cell lines. Pools 1 and 2 showed a band
after RT-PCR for the RAD51C/ATXN7 fusion (Figure 2). Sequencing of the PCR product
confirmed the presence of the fusion. After deconvoluting the pools, I discovered that the
RAD51C/ATXN7 fusion was present in 2 other breast cancer cell lines, T47D and MDA MB361.
The fusion of RAD51C and ATXN7 most likely results in loss of a critical C-terminal
domain of RAD51C. By western blot I was able to detect a shorter band in MCF-7 and
MDA MB361. This was confirmed by immunoprecipitating RAD51C with a specific antibody
and probing the western blot with a different RAD51C antibody (Figure 3).
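The two-stage pooled screen described above (assay each pool, then assay only the members of positive pools) can be sketched as follows. This is a minimal illustration and not the protocol used in the study: `mock_rt_pcr` is a hypothetical stand-in for an RT-PCR readout, the `line_NN` names are placeholders, and the assignment of T47D and MDA MB361 to pools 1 and 2 is an assumption made only so the example runs.

```python
from typing import Callable, Dict, List, Set


def deconvolute(pools: Dict[str, List[str]],
                assay: Callable[[List[str]], bool]) -> Set[str]:
    """Two-stage pooled screen: assay each pool, then each member of positive pools."""
    positives: Set[str] = set()
    for members in pools.values():
        if not assay(members):      # negative pool: no member is tested further
            continue
        for line in members:        # positive pool: test members one by one
            if assay([line]):
                positives.add(line)
    return positives


# Hypothetical layout: 16 cell lines in 4 pools of 4 (pool membership is assumed).
carriers = {"T47D", "MDA MB361"}
pools = {
    "pool1": ["T47D", "line_02", "line_03", "line_04"],
    "pool2": ["MDA MB361", "line_06", "line_07", "line_08"],
    "pool3": ["line_09", "line_10", "line_11", "line_12"],
    "pool4": ["line_13", "line_14", "line_15", "line_16"],
}


def mock_rt_pcr(samples: List[str]) -> bool:
    """Stand-in for the RT-PCR band call: positive if any sample carries the fusion."""
    return any(sample in carriers for sample in samples)


hits = deconvolute(pools, mock_rt_pcr)
```

With two positive pools, this layout needs 4 pool assays plus 8 single-line assays instead of 16 individual assays; the saving grows as positives become rarer.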


[Figure 2 lane labels: BCCL Pool 1, 2, 3, 4; MCF-7]

Figure 2: Discovery of chimeric mRNA product of the RAD51C/ATXN7 fusion
in breast cancer cell lines. RNA was isolated from 16 breast cancer cell
lines, and 4 pools of 4 cell lines were made. RT-PCR was performed for the
fusion. The presence of the fusion transcript was confirmed by sequencing.

Figure 3: Presence of a truncated form of RAD51C in breast cancer cell lines.
A) Immunoprecipitation was performed on cell lysates with a mouse anti-Rad51C
antibody. Eluates from the IP were run on an SDS-PAGE gel and probed with a
rabbit anti-Rad51C antibody. B) Repeat of the experiment in A, with the addition
of a negative control (MCF-10A).


I  performed  preliminary  functional  studies  for  the  ARFGEF2/SULF2  fusion,  which  are 
reported  under  Task  4. 

Another angle we are pursuing is the evolution of breakpoints. To gain insight into the
heterogeneity of genomic breakpoints in breast cancer cell lines, and to narrow down the
breakpoints originating from the ancestor, I studied these 157 validated breakpoints in
seven MCF-7 sub-lines (Figure 4). Thirty-one of the original 157 breakpoints identified in
MCF-7/BAC are present in all MCF-7 sub-lines. The distribution of the 157 breakpoints in
the genome shows clusters of breakpoints on certain chromosomes, together with many
breakpoints distributed randomly throughout the genome. Among the 31 breakpoints that are
common to all MCF-7 sub-lines, the clusters are retained and the number of randomly
distributed breakpoints is dramatically reduced. A finding of interest is a strong
enrichment of gene-containing breakpoints in the common set (77.4% of the 31 common
breakpoints vs 50.3% of all 157, p=0.0056) (Figure 5). Even more interesting, 5 of the 10
fusions are present in all sub-lines (16.1% of the common set vs 6.4% of all breakpoints)
(Figure 5). Also, all 4 fusion genes expressing a chimeric mRNA product are present in all
MCF-7 sub-lines, indicating that these breakpoints might originate from an ancestral cell
line. With this information we may gain a better understanding of the evolution and
heterogeneity of genomic instability and rearrangements in breast cancer. Also, by
narrowing down to breakpoints that are present in all the MCF-7 sub-lines, I get closer to
the 'true' breakpoints that originated in the tumor from which MCF-7 is derived.

Figure 4: Examples of breakpoint analysis by PCR. Breakpoints were scored on the presence,
size and number of PCR bands.

These 31 breakpoints will be tested on other breast cancer cell lines to determine their
recurrence rate.

Figure 5: Graphic representation of the distribution of the breakpoints. The enrichment of
gene-containing breakpoints is statistically significant (p=0.0056, Fisher exact).
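The enrichment statistics above can be illustrated with a small, dependency-free Fisher exact test. This is a sketch under stated assumptions: the report gives percentages but not the contingency table, so the counts below (24 of 31 common breakpoints containing genes, inferred from 77.4%, against the remaining 55 of 126) are an assumed layout, and the resulting p-value is not expected to reproduce the reported p=0.0056.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Counts: 157 breakpoints and 79 gene-containing are from the report;
# 24 of 31 is inferred from the reported 77.4% (assumption).
all_bp, all_gene = 157, 79
common_bp, common_gene = 31, 24

pct_all = round(100 * all_gene / all_bp, 1)
pct_common = round(100 * common_gene / common_bp, 1)

# Assumed table: rows = common vs not common, columns = gene vs no gene.
p_value = fisher_exact_two_sided(
    common_gene, common_bp - common_gene,
    all_gene - common_gene, (all_bp - common_bp) - (all_gene - common_gene),
)
```

A different but equally plausible table (for example, the 31 common breakpoints against all 157) gives a different p-value, which is why the table layout should be stated alongside the test.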


Task  2:  Determine  recurrence  rate  of  the  breakpoints  in  a  panel  of  breast  tumors. 

Only recently was I able to narrow down the possibly 'true' breakpoints originating
from the ancestor of the MCF-7 breast cancer cell line. These will be tested on other
breast cancer cell lines, and then on a panel of breast tumors.

I am now optimizing the DNA extraction from tumors. To be able
to perform long range PCR of about 10 kb (as proposed in the grant), the DNA needs to
be more than 10 kb long and of good quality. I am now comparing extraction protocols to
determine the best technique for obtaining the DNA required for long range PCR.


Task 3: Validate breakpoints in an independent set of breast cancer tumors and
associate breakpoints with histo-pathological and clinical characteristics.

a. Develop break-away FISH probes for the detection of recurrent breakpoints.

FISH probes were developed for the RAD51C/ATXN7 fusion. After confirming probe
specificity on control lymphocytes, metaphase spreads of MCF-7 and MCF-10A
(negative control) cells were hybridized with probes proximal and distal to the
break in RAD51C, and with a probe for RAD51C spanning the break (Figure 6).
These break-away FISH experiments did not give a conclusive answer, so we
decided to test directly for the presence of the fusion. Probes for RAD51C and ATXN7
were developed and hybridized on MCF-7 and MCF-10A metaphase spreads. These results
show RAD51C and ATXN7 signals in close proximity in the MCF-7 cells
and not in the MCF-10A cells (Figure 7).
With these data we were able to generate a detection tool for the presence of the
RAD51C/ATXN7 fusion, and to confirm the presence of the genomic translocation in
MCF-7 cells.

Figure 6: Detection of the RAD51C/ATXN7 fusion by FISH. A) Proximal and distal probes were
tested on metaphase spreads of MCF-7 cells. MCF-7 shows clear amplification of RAD51C.
B) Break-away FISH on metaphases of MCF-7 and MCF-10A cells.


Figure 7: Detection of the RAD51C/ATXN7 fusion by FISH. Metaphases of MCF-7 and MCF-10A
cells were hybridized with a RAD51C probe and an ATXN7 probe. Yellow signal in MCF-7 shows
co-localization of RAD51C and ATXN7, indicating a fusion between the genes.

b. Detect recurrent breaks with break-away FISH and associate with histo-pathological
and clinical characteristics.

FISH  was  performed  on  T47D,  and  MDA  MB361,  but  this  needs  to  be  repeated  due  to 
low  signal  intensity. 


Task  4:  Study  the  biological  significance  of  the  breakpoints  using  in  vitro  models. 

a.  Determine  the  downstream  consequence  based  on  the  position  of  the  aberrant  joint. 

Based on protein sequence analysis and protein translation programs, I was able to
predict the fusion proteins and speculate on the consequences of the ARFGEF2/SULF2
and RAD51C/ATXN7 fusions. In the ARFGEF2/SULF2 fusion, the Sulfatase 2
(SULF2) protein loses the signal peptide that targets it for secretion, while the added
sequence of ARFGEF2 does not contribute any functional domain. This might mean that the
fusion creates a non-functional Sulfatase 2.

Protein translation programs showed that the fusion between RAD51C and ATXN7 shifts the
reading frame. The frameshift introduces a stop codon early in the ATXN7 sequence. This
most likely results in the loss of a critical C-terminal domain of RAD51C, without the
addition of any significant sequence of Ataxin 7. Preliminary data confirming this
truncation are shown under Task 1.
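The frameshift reasoning above can be illustrated with a toy translation. This is a minimal sketch using the standard genetic code; the nucleotide strings are invented for illustration and are not RAD51C or ATXN7 sequence. It shows how a junction that adds one extra base upstream shifts the downstream partner out of frame and exposes a premature stop codon.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str):
    """Translate from the first base; return (protein, hit_stop_codon)."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[pos:pos + 3]]
        if aa == "*":
            return "".join(protein), True
        protein.append(aa)
    return "".join(protein), False


# Invented sequences (hypothetical, for illustration only).
upstream_in_frame = "ATGGCTGAA"     # ends on a codon boundary
upstream_shifted = "ATGGCTGAAC"     # one extra base before the junction
downstream = "GCTAACTGGAAA"         # reads A-N-W-K in its native frame

in_frame = translate(upstream_in_frame + downstream)
shifted = translate(upstream_shifted + downstream)
```

In the shifted fusion the downstream partner contributes only one out-of-frame residue before the stop codon, which mirrors the predicted outcome of a truncated upstream protein with no significant downstream sequence.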

b.  Recreate  join  with  cloning  techniques. 


Figure  8:  Test  expression  of  fusion  constructs.  293  cells  were  transfected  with  either  control 
plasmid  (not  shown),  ARFGEF2/SULF2,  or  DEPDC1B/ELOVL7  constructs.  Cells  were  fixed, 
stained  with  an  anti-V5  antibody,  and  imaged  by  fluorescence  microscopy. 


I have cloned the ARFGEF2/SULF2, DEPDC1B/ELOVL7 and RAD51C/ATXN7 fusions into
mammalian expression vectors by performing RT-PCR on MCF-7 cells. Expression from
the ARFGEF2/SULF2 and DEPDC1B/ELOVL7 vectors has been tested by transfecting 293 cells.
The cells were then stained with an anti-V5 antibody and analyzed by
fluorescence microscopy (Figure 8).


c. Perform targeted experiments to determine functional consequence of the aberrant
join.

[...] of the ARFGEF2/SULF2 [...] cells. All three cell lines treated with SULF2 siRNA
and used in a proliferation assay exhibited an advantage over the cells treated with
control siRNA. Also, cells treated with SULF2 siRNA showed enhanced survival. Cells with
reduced SULF2 die less, and recover faster in serum-free conditions, than control cells.
Knock-down of SULF2 mRNA also gave a clear advantage in anchorage-independent growth
capability. This shows that knocking down SULF2 enhances tumorigenic properties in multiple
breast cell lines, and that SULF2 might act as a tumor suppressor in breast cancer
development. The presence of the ARFGEF2/SULF2 fusion might mean a loss of
function of the wildtype tumor suppressor Sulfatase 2 and enhanced tumorigenicity of
MCF-7 cells.


Figure 9: A) Cells treated with SULF2 siRNA show enhanced proliferation compared to cells
treated with control siRNA (x-axis: time in days). B) Cells treated with SULF2 siRNA show
enhanced survival compared to cells treated with control siRNA. C) Treatment of MCF-7B and
MDA MB231 cells with siRNA for SULF2 increases anchorage-independent growth capability.


Key research accomplishments

•  Discovered 157 joins in the MCF-7 cell line, of which only a few have been previously
described.

•  Discovered 10 gene fusions, of which 4 express a chimeric mRNA (3 new, including
ARFGEF2/SULF2, and 1 previously described).

•  31 of the 157 breakpoints are present in all 7 MCF-7 sublines tested. This allows us to
narrow down the 'true' breakpoints present in the ancestor of the MCF-7 cell lines.

•  Confirmed the RAD51C/ATXN7 fusion by FISH in the MCF-7 cell line.

•  Cloned 3 fusion transcripts (ARFGEF2/SULF2, DEPDC1B/ELOVL7, RAD51C/ATXN7) into
mammalian expression vectors by amplification of the fusion transcript by RT-PCR.

•  Discovered the RAD51C/ATXN7 fusion transcript in two other breast cancer cell lines
(T47D and MDA MB361).

•  Discovered a short form of Rad51C protein in MCF-7 and MDA MB361.

•  Sulfatase 2 acts as a tumor suppressor in breast cancer cell lines, and might be
dysfunctional after generation of the ARFGEF2/SULF2 fusion.

Training accomplishments


•  Presented  twice  at  the  Research  and  Development  workshop  of  the  Breast  Center. 

•  Attended the Breast Center/Cancer Center retreat and gave an oral presentation of data
(November 2008)

•  Attended  and  presented  a  poster  at  the  LINK  meeting  (February  2009) 

•  Attended  and  presented  a  poster  at  the  Breast  Center/Cancer  Center  retreat 
(September  2009) 

•  Attended the weekly Research and Development workshop of the Breast Center

•  Attended the bi-monthly Journal Club of the Breast Center

•  Contributed  to  the  generation  of  data,  writing  and  editing  of  the  manuscript 
published  in  Genome  Research 

•  Attended  the  course  ‘Translational  Breast  Cancer’ 

•  Supervised  several  graduate  and  summer  students. 


Reportable  outcomes 

•  Hampton OA, den Hollander P, Miller CA, Delgado DA, Li J, Coarfa C, Harris RA,
Richards S, Scherer SE, Muzny DM, Gibbs RA, Lee AV, Milosavljevic A: A
sequence-level map of chromosomal breakpoints in the MCF-7 breast cancer cell
line yields insights into the evolution of a cancer genome. Genome Research.
2009 Feb; 19(2):167-77.

•  Abstract submission for the Breast Center Retreat, November 2008, entitled:
Discovery of functional genomic breakpoints in breast cancer.

•  Abstract submission for the Breast Center Retreat, September 2009, entitled:
Evolution of genomic diversity in the breast cancer cell line MCF-7.

•  Abstract submission to the San Antonio Breast Cancer Symposium, December 2009,
entitled: Evolution of genomic diversity in the breast cancer cell line MCF-7.


Conclusion 

In  contrast  to  leukemias  and  lymphomas,  carcinomas  contain  more  complex 
chromosomal  rearrangements,  only  partially  detectable  using  classic  cytogenetic  methods. 
Thus,  our  knowledge  of  chromosomal  rearrangements  in  solid  tumors  is  very  limited,  and 
“gene  fusions”  defining  a  specific  type  of  solid  tumor  have  not  yet  been  characterized. 
This  lack  of  knowledge  has  supported  the  paradigm  that  chromosomal  rearrangements 
leading  to  gene  fusions  are  almost  exclusively  seen  in  haematologic  malignancies  and  are 
extremely  rare  (maybe  <1%)  in  solid  tumors. 

Here we set out to discover the chromosomal rearrangements that are important in breast
cancer. The data presented here show that there are indeed breakpoints that have
functional significance in breast cancer cell lines. I also discovered a fusion that is
present in two other breast cancer cell lines besides MCF-7. The next step is to test for
the presence of the 31 breakpoints in all other breast cancer cell lines to assess
recurrence. I am getting closer to testing breakpoints in breast tumors. I also discovered
that I might need to change the break-away FISH technique, and instead try to detect the
presence of the fusion.

The data shown on the ARFGEF2/SULF2 and RAD51C/ATXN7 fusions indicate that we
discovered a novel strategy used by tumor cells to silence important tumor suppressors.
The work performed in the coming year will be extremely valuable in answering the very
important question of which chromosomal alterations are important for breast cancer
genesis.
By applying a method that combines end-sequence profiling and massively parallel sequencing, we obtained a sequence-level
map of chromosomal aberrations in the genome of the MCF-7 breast cancer cell line. A total of 157 distinct somatic
breakpoints of two distinct types, dispersed and clustered, were identified. A total of 89 breakpoints are evenly dispersed
across the genome. A majority of dispersed breakpoints are in regions of low copy repeats (LCRs), indicating a possible
role for LCRs in chromosome breakage. The remaining 68 breakpoints form four distinct clusters of closely spaced
breakpoints that coincide with the four highly amplified regions in MCF-7 detected by array CGH located in the
1p13.1-p21.1, 3p14.1-p14.2, 17q22-q24.3, and 20q12-q13.33 chromosomal cytobands. The clustered breakpoints are not
significantly associated with LCRs. Sequences flanking most (95%) breakpoint junctions are consistent with double-stranded
DNA break repair by nonhomologous end-joining or template switching. A total of 79 known or predicted genes are
involved in rearrangement events, including 10 fusions of coding exons from different genes and 77 other rearrangements.
Four fusions result in novel expressed chimeric mRNA transcripts. One of the four expressed fusion products (RAD51C-ATXN7)
and one gene truncation (BRIP1 or BACH1) involve genes coding for members of protein complexes responsible for
homology-driven repair of double-stranded DNA breaks. Another one of the four expressed fusion products (ARFGEF2-SULF2)
involves SULF2, a regulator of cell growth and angiogenesis. We show that knock-down of SULF2 in cell lines causes
tumorigenic phenotypes, including increased proliferation, enhanced survival, and increased anchorage-independent
growth.

[Supplemental material is available online at www.genome.org and through the Breast Cancer project page at
www.genboree.org. All MCF-7 BAC clones are available from Amplicon Express under name HTA and plate/row/column
names as indicated. The sequence data from this study have been submitted to the NCBI Trace and Short Read
Archives (http://www.ncbi.nlm.nih.gov) under accession nos. 2172854909-2172901416 and 2172904852-2172911164, and
SRR006762-SRR006767, respectively.]


Many cancer genomes are characterized by mutability including
microsatellite instability (MIN) and chromosomal instability
(CIN) (Lengauer et al. 1998). It is now generally anticipated that
sequencing of cancer genomes using massively parallel sequencing
technologies (Korbel et al. 2007; Campbell et al. 2008) will
provide insights into structural mutability. Recent sequencing of
four cancer amplicons (Bignell et al. 2007) derived from the
HCC1954 breast cancer cell line and two lung cancer cell lines
provided evidence for homologous and nonhomologous repair of
double-strand DNA breaks induced by the breakage-fusion-bridge
(BFB) mechanism.

Gene fusions and truncations that result from chromosomal
rearrangements provide insight into the molecular mechanisms of
cancer progression. Recurrent rearrangements of specific genes
indicate increased mutability or positive selection (or a combination
of both) in the evolution of tumor genomes. Recurrent fusions,
translocations, and other aberrant joins are used as highly
informative diagnostic and prognostic markers and drug targets in
leukemias, lymphomas, and sarcomas. A total of 337 genes involved
in fusions in cancer genomes have been recently surveyed
(Mitelman et al. 2007). Four gene fusions have previously been reported
in breast carcinomas (ETV6-NTRK3, ODZ4-NRG1, TBL1XR1-RGS17,
BCAS3-BCAS4) (Mitelman et al. 2007; Ruan et al. 2007).

Breast cancer and carcinomas in general have proven less
tractable to fusion discovery due to the typically higher degree of
rearrangement. However, a prognostically significant rearrangement
was recently discovered in the majority of prostate cancers
(Tomlins et al. 2005). Of note, the initial discovery was not
identified by analyzing DNA sequence or structure, but via the analysis
of outlier gene expression, followed by a targeted locus-specific
search for a fusion in genomic DNA. Here we demonstrate
a method to detect gene fusions directly by the analysis of genomic
DNA, even in highly rearranged breast cancer.

MCF-7 is the most widely used cell line model for estrogen-positive
breast cancer. The cell line has been derived from a pleural
effusion taken from a patient with metastatic breast carcinoma
(Soule et al. 1973). Evidence of CIN in MCF-7 comes from apparent
aneuploidy and significant genomic divergence in several
sublines (Jones et al. 2000; Nugoli et al. 2003). Chromosomal
aberrations in MCF-7 have previously been studied by spectral
karyotyping (Kytola et al. 2000; Rummukainen et al. 2001),
comparative genomic hybridization (CGH) (Kytola et al. 2000;
Rummukainen et al. 2001), array CGH (Neve et al. 2006; Shadeo
and Lam 2006; Jonsson et al. 2007), single nucleotide polymorphism
arrays (Huang et al. 2004), and gene expression arrays
(Neve et al. 2006).

More recently, bacterial artificial chromosome (BAC)-based
end sequence profiling (ESP) (Volik et al. 2003, 2006; Raphael et al.
2008) has been applied to study genomic rearrangements in cancer
genomes. Volik and colleagues sequenced a total of 19,831
BAC ends from the Amplicon Express MCF-7 BAC library, ~1x
clone coverage of the human genome, to identify 582 BACs
containing rearrangements.

As a starting point for our analysis, we constructed BAC pools
from a nonredundant subset (n = 552) of rearranged BACs
identified by Volik et al. (2003, 2006). To map chromosomal aberrations
in the genome of the MCF-7 breast cancer cell line at sequence
level resolution, we developed a method that combines end-sequence
profiling and massively parallel sequencing. By analyzing
sequences of the chromosomal breakpoints in the BAC pools,
we gained insights into the mechanisms of chromosomal instability
and repair. Specific gene fusions and truncations that have
emerged during the pathological evolution of this cancer genome
point to the molecular mechanisms of the disease. Additional
products of our research are benchmarking reagents for the
development of a new generation of methods for detecting structural
genome variation, including well-characterized BAC pools and
validated breakpoints in the MCF-7 genome.

Results 

At  least  157  breakpoints  were  induced  by  somatic 
rearrangements  in  MCF-7 

Aberrant  breakpoint-induced  joins  were  identified  by  combining 
"bridging"  and  "outlining"  steps,  as  illustrated  in  Figure  1A.  The 
bridging  step  utilizes  end-sequence  information  from  fosmid-sized 
clone  inserts  to  connect  chromosomal  loci  brought  together  at 
aberrant  rearrangement-induced  joins  in  the  cancer  genome.  End- 
sequences  of  breakpoint-spanning  fosmids  were  recognized  as 
those  that  do  not  map  onto  the  reference  genome  in  a  manner 
consistent  with  the  clone  insert  size  or  end-sequence  orientation. 
The  outlining  step  involves  a  precise  localization  of  breakpoint 
sites  by  mapping  short  tags  generated  by  the  454  Life  Sciences 
(Roche)  pyrosequencing  machine  onto  the  reference  genome. 

As illustrated in Figure 1B, three pools, each containing 192
BACs containing putative rearrangements, were constructed for
the purpose of massively parallel sequencing using the 454 GS
sequencing machine. Approximately 300,000 short (~100-bp)
reads were sequenced from each pool, providing ~1x sequence
coverage for the purpose of outlining. Six 96-BAC pools were
formed from the same set of BACs for the purpose of fosmid library
preparation, end-sequencing and bridging. Approximately 8000
to 10,000 fosmid inserts from each of the six pools were
end-sequenced, providing 24x clone coverage and ~1x sequence
coverage for the purpose of bridging.
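The coverage figures quoted above can be checked with back-of-the-envelope arithmetic. The pool sizes, read counts and read length come from the text; the mean BAC insert size (150 kb), fosmid insert size (40 kb) and fosmid end-read length (700 bp) are typical values assumed here and are not stated in the source.

```python
def coverage(total_bases: float, target_bases: float) -> float:
    """Fold coverage: bases (or clone length) sampled divided by target length."""
    return total_bases / target_bases


# Assumed typical values (not given in the text).
BAC_INSERT_BP = 150_000
FOSMID_INSERT_BP = 40_000
FOSMID_END_READ_BP = 700

# Outlining: 192 BACs per pool, ~300,000 reads of ~100 bp each.
outline_target = 192 * BAC_INSERT_BP
outline_seq_cov = coverage(300_000 * 100, outline_target)

# Bridging: 96 BACs per pool, 8000 to 10,000 fosmids, both ends sequenced.
bridge_target = 96 * BAC_INSERT_BP
clone_cov = [coverage(n * FOSMID_INSERT_BP, bridge_target) for n in (8_000, 10_000)]
end_seq_cov = [coverage(n * 2 * FOSMID_END_READ_BP, bridge_target) for n in (8_000, 10_000)]
```

Under these assumptions the outlining reads give roughly 1x sequence coverage, the fosmids give clone coverage in the low-to-high twenties, and the fosmid end reads give a little under 1x sequence coverage, which is in line with the approximate figures in the text.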

Upon sequencing, the fosmid end-reads and the 454 reads, together with
the BAC end-sequences produced by Volik et al. (2003, 2006), were
mapped onto the reference human genome. Independent aberrant mapping
of two fosmids across a specific putative breakpoint was considered to
constitute sufficient evidence to declare the breakpoint. BAC or
fosmid ends that map onto different chromosomes are interpreted as
interchromosomal breakpoints. The outlined regions were bridged using
end-sequences from BACs and fosmids. The combination of outlining and
bridging enabled identification of breakpoint locations down to a
PCR-able distance. As indicated in Figure 1C, out of the total of 410
detected breakpoints, 157 could be confirmed by PCR across breakpoint
joins as likely distinct somatic mutations. As indicated by the bars
in the middle of Figure 1C, the remaining breakpoints failed the
confirmation process for a number of different reasons, as we explain
next.
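
The calling rule described above (two independent fosmids mapping aberrantly across the same putative breakpoint) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the `EndPair` record, the 50-kb insert-size cutoff, and the greedy grouping window are all hypothetical, and read orientation is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndPair:
    """Mapped positions of the two end-reads of one clone (hypothetical record)."""
    clone: str
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int


MAX_INSERT_BP = 50_000  # assumed upper bound on a normal fosmid insert


def is_aberrant(p):
    """Ends on different chromosomes, or farther apart than a normal insert."""
    if p.chrom_a != p.chrom_b:
        return True
    return abs(p.pos_a - p.pos_b) > MAX_INSERT_BP


def kind(p):
    return "interchromosomal" if p.chrom_a != p.chrom_b else "intrachromosomal"


def declare_breakpoints(pairs, window=MAX_INSERT_BP, min_support=2):
    """Group aberrant pairs whose ends fall near each other and keep groups
    supported by at least `min_support` distinct clones.
    Assumes the two ends of every pair are listed in a consistent order."""
    groups = []
    for p in filter(is_aberrant, pairs):
        for g in groups:
            ref = g[0]
            same_chroms = (p.chrom_a, p.chrom_b) == (ref.chrom_a, ref.chrom_b)
            if (same_chroms
                    and abs(p.pos_a - ref.pos_a) <= window
                    and abs(p.pos_b - ref.pos_b) <= window):
                g.append(p)
                break
        else:
            groups.append([p])
    return [(kind(g[0]), g) for g in groups
            if len({q.clone for q in g}) >= min_support]


# Toy input: f1 and f2 support the same chr20/chr3 join; f3 is a normal
# fosmid; f4 is aberrant but unsupported by a second clone.
pairs = [
    EndPair("f1", "chr20", 1_000_000, "chr3", 5_000_000),
    EndPair("f2", "chr20", 1_010_000, "chr3", 5_020_000),
    EndPair("f3", "chr1", 100, "chr1", 38_000),
    EndPair("f4", "chr17", 100, "chr17", 9_000_000),
]
calls = declare_breakpoints(pairs)
```

In this toy input only the chr20/chr3 join is declared, because it is the only one seen in two independent clones.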

A total of 47 breakpoints could not be unambiguously resolved down to
a PCR-able distance using the outlining method. PCR primers were
designed for the remaining breakpoints using a semi-automated primer
design pipeline. When applied to pooled BACs, PCR primers failed to
generate amplicons in the expected size range for 23 predicted
breakpoint joins. Further confirmation included amplification of a
pool of genomic DNA from six MCF-7 cell lines (B, BK, C, D, L, and
Neo). DNA isolated from MCF-10A and normal human female DNA (Novagen)
were used as negative controls. A total of 123 PCR primer pairs that
produced amplicons from the BAC pool did not produce amplicons from
the pooled cell-line genomic DNA. A majority of these breakpoint sites
contained HindIII restriction sites. Since the BAC library was
prepared using HindIII partial digestion of genomic DNA, those
breakpoints were most likely created by fusion of digestion products
in the course of BAC library preparation. Other sources of this
discrepancy may include cell line-specific aberrations generated over
the passages that preceded preparation of the BAC library.

To identify structural polymorphic variants present in the germline
of the MCF-7 donor, PCR amplification of breakpoint joins was
performed on a pool of 90 Caucasian HapMap genomes (International
HapMap Consortium 2005). Additionally, a search for occurrences of the
apparently somatic joins was performed in publicly available genomic
sequences using the Pash program (Kalafus et al. 2004). A total of 40
apparently aberrant joins were present in the HapMap samples, as
indicated by the presence of a PCR product, and thus correspond to
structural alleles different from those represented in the reference
genome assembly. Finally, some breakpoints were found to occur in more
than one BAC, and the count was reduced by 20 to eliminate multiple
counting, resulting in a total of 157 unique confirmed somatic
breakpoint joins in the MCF-7 genome. Of the 157 MCF-7 somatic breast
cancer breakpoints, 74 (47%) formed interchromosomal and 83 (53%)
intrachromosomal joins, as illustrated in Figure 2, A and B.
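
The confirmation funnel described in the two preceding paragraphs can be tallied explicitly. The short check below uses only the counts given in the text and shows that the exclusions account exactly for the difference between the 410 detected and the 157 confirmed breakpoints.

```python
# Tally of the confirmation funnel, using only counts stated in the text.
detected = 410
excluded = {
    "not resolved to a PCR-able distance": 47,
    "no amplicon of expected size from pooled BACs": 23,
    "amplicon from BAC pool but not from cell-line genomic DNA": 123,
    "present in HapMap pool (germline structural variant)": 40,
    "duplicates observed in more than one BAC": 20,
}
confirmed = detected - sum(excluded.values())

inter, intra = 74, 83
pct_inter = round(100 * inter / confirmed)
pct_intra = round(100 * intra / confirmed)

print(confirmed, pct_inter, pct_intra)
```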

A  majority  of  the  somatic  breakpoints  could  be  assigned 
to  specific  BACs 

If a chromosomal segment outlined by 454 reads connected a BAC
end-sequence and a breakpoint-spanning fosmid end-sequence, the
breakpoint could be associated with the BAC. Out of 552 pooled BACs,
at least one breakpoint could be assigned to 316 (57%) of them. The
remaining BACs fall into two groups. First, in 129 (23%) cases,
breakpoint assignment was inconclusive due to ambiguous mapping of
reads onto the reference genome, mostly due to repetitive DNA regions,
apparent overlaps between BACs, and other causes. Second, in 107 (20%)
cases, a single outlining block connected BAC ends, thus indicating
lack of any rearrangement, contrary to previous reports (Volik et al.
2003, 2006).

Figure 1. (A) An illustration of the principle of the method. Breakpoints within a BAC containing segments from chromosomes 20, 3, and 17 are
detected using a combination of "bridging" and "outlining" steps. The bridging step maps fosmid end-sequences onto the reference genome. The
outlining step maps short tags (labeled "PyroSeqs") using 454 technology from the BAC (in practice a pool of BACs) onto the reference genome. The
results of bridging and outlining jointly allow precise mapping of breakpoints and reconstruction of rearranged BACs. (B) Organization of the mapping
experiment. The nonredundant collection of 552 rearrangement-containing BACs, 17 normal BAC negative controls, and seven positive controls was
arrayed in six 96-well plates and pooled as indicated. Three 454 sequencing reactions (involving BACs pooled from plate pairs) produced tags for the
purpose of outlining. Six fosmid libraries (one from each 96-well plate pool of BACs) were constructed for Sanger-based sequencing of fosmid ends and
bridging. (C) Bar charts detailing the classification of detected MCF-7 breakpoints.

To examine the source of the disagreement with the previous reports,
the 107 disagreements were examined in detail. Most of the
disagreements could be explained either by the differences between
reference genome assemblies used in the previous and current studies
or by mismapping of BAC-end sequence reads or by a combination of the
two factors. Assemblies used in the previous studies were NCBI Build
30 of June 2002 (Volik et al. 2003) and NCBI Build 34 of July 2003
(Volik et al. 2006), while our study employed NCBI Build 36 of March
2006. The newer assembly is likely to be more correct and complete,
but some of the disagreements may also be explained by the presence of
different structural alleles at sites of structural polymorphisms. The
disagreements tended to occur in regions containing low copy repeats
(LCRs). For example, Volik et al. (2003) identified MCF-7 BAC 9110 as
bridging apparent translocation t(11;11)(p11.12;q14.3) and apparently
confirmed the rearrangement by fluorescent in situ hybridization
(FISH). Examination of Build 36 reveals copies of an LCR at both
11p11.12 and 11q14.3. The LCR was absent from Builds 30 and 34, thus
explaining the aberrant BAC-end sequence mapping and even the
erroneous "confirmation" by FISH.

Examination  of  breakpoint  sequences  reveals  signatures 
of  DSB  repair 

To examine breakpoints at the sequence level, all 157
breakpoint-spanning amplicons were used as substrates for sequencing
from both ends. Most amplicons were small enough (less than 1 kb on
average) for the Sanger read from at least one of the ends to reach
the breakpoint. Difficulty of sequencing across breakpoints has been
documented (Lee et al. 2007; Liu and Carson 2007), especially in
repeat-rich regions. To ameliorate the problem, we sequenced DNA from
specific BAC pools and employed nested sequencing primers in cases of
first-pass sequencing failures. Breakpoint-straddling sequence could
be obtained from 86 (55%) amplicons and could not be obtained for the
remaining 71 (45%). Many of the failures were due to inability to
design unique primers for sequencing across breakpoints that fall
within repeat-rich regions.

Examination of 86 breakpoints that could be resolved to the base pair
level (summarized in the chart in the middle of Fig. 2B) revealed 14
flush joins without evidence of microhomology or intervening sequence,
29 joins with intervening inserts of unknown genomic origin averaging
over 100 bp in length, and 43 joins where the joined segments exhibit
homology. The extent of homology was in most (88%) cases restricted to
<7 bp, consistent with microhomology observed in double-stranded
breaks repaired by nonhomologous end-joining (NHEJ) or template
switching (Sonoda et al. 2006). Due to the absence of straddling
sequence, the remaining 71 breakpoints could only be analyzed at the
~1-kbp level of resolution.

Figure 2. (A) Circular visualization of the MCF-7 genome obtained using Circos software. Chromosomes are individually colored with centromeres in
white and LCR regions in black. MCF-7 BAC array comparative genome hybridization data (Jonsson et al. 2007) are plotted with gains in green and losses
in red using log2 ratio. The inner chromosome annotations depict 157 somatic MCF-7 breast tumor chromosomal rearrangements associated with LCRs
(black) and breakpoints not associated with LCRs (green). Chromosomal rearrangements are depicted on each side of the MCF-7 breakpoints; intra-
chromosomal rearrangements (blue) are located outside and interchromosomal rearrangements (red) are located in the center of the circle. (B) Bar charts
indicating classification of somatic breakpoints in MCF-7.

Out of the 86 somatic breakpoints isolated to base pair resolution,
only four (5%) exhibited sequence patterns (sequence identity and
equal crossover between two homologous loci) consistent with
nonallelic homologous recombination (NAHR) (chart on the right of Fig.
2B). The dominant mechanism responsible for the repair of
double-strand breaks in MCF-7 therefore appears to be NHEJ or template
switching.
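
The three join classes reported above (flush, with intervening insert, with homology) can be distinguished from where the two flanking alignments end and begin within a breakpoint-straddling read. The sketch below illustrates that logic only; the function name and the read-coordinate convention are hypothetical, and it assumes each flank has been aligned as far as it will extend.

```python
def classify_join(a_end, b_start):
    """Classify a breakpoint join from alignment coordinates within a read.

    a_end   -- read index (exclusive) where the alignment to segment A ends
    b_start -- read index where the alignment to segment B begins

    A gap between the alignments is an inserted sequence of unknown origin;
    an overlap means those bases match both segments (homology at the join).
    """
    if b_start > a_end:
        return ("insert", b_start - a_end)
    if b_start == a_end:
        return ("flush", 0)
    return ("homology", a_end - b_start)


# Counts reported in the text for the 86 joins resolved to the base pair.
reported = {"flush": 14, "insert": 29, "homology": 43}
total_resolved = sum(reported.values())

print(classify_join(120, 120))   # alignments abut
print(classify_join(120, 230))   # 110 unaligned bases between the flanks
print(classify_join(120, 116))   # 4 bases shared by both flanks
```

An overlap of fewer than 7 bases in this scheme corresponds to the microhomology range that the text associates with NHEJ or template switching.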

Two distinct types of breakpoints exist in MCF-7: clustered
and LCR-associated

As evident from Figure 2, the breakpoints in MCF-7 are not evenly
distributed across the genome. A number of clusters of closely spaced
breakpoints are evident. To formally delineate the clustered
breakpoints from the remainder, clusters of eight or more breakpoints
that are less than 1.1 Mbp apart were identified. Four such clusters
emerged in the following locations: 1p13.1-21.1, 3p14.1-p14.2,
17q22-q24.3, and 20q12-q13.33. These four rearrangement clusters,
illustrated in Figure 3A, contain 43% of all MCF-7 somatic
breakpoints, while representing only 1.5% of the normal reference
genome.
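
One way to read the cluster definition above is as a chaining rule: sort the breakpoints on each chromosome, link neighbours that lie less than 1.1 Mbp apart, and keep chains with at least eight members. The sketch below implements that interpretation, which is an assumption about how the criterion was applied; the coordinates in the example are invented.

```python
from collections import defaultdict


def find_clusters(breakpoints, max_gap=1_100_000, min_size=8):
    """Return (chrom, start, end, n) for runs of breakpoints in which each
    consecutive pair is less than `max_gap` bp apart and n >= min_size."""
    by_chrom = defaultdict(list)
    for chrom, pos in breakpoints:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom, positions in by_chrom.items():
        positions.sort()
        run = [positions[0]]
        for pos in positions[1:]:
            if pos - run[-1] < max_gap:
                run.append(pos)
            else:
                if len(run) >= min_size:
                    clusters.append((chrom, run[0], run[-1], len(run)))
                run = [pos]
        if len(run) >= min_size:
            clusters.append((chrom, run[0], run[-1], len(run)))
    return clusters


# Invented coordinates: nine breakpoints spaced 0.5 Mbp apart on chr20,
# plus two isolated breakpoints on chr5.
toy = [("chr20", 45_000_000 + i * 500_000) for i in range(9)]
toy += [("chr5", 1_000_000), ("chr5", 90_000_000)]
toy_clusters = find_clusters(toy)
```

With this toy input only the chr20 run qualifies; the two chr5 breakpoints are too few and too far apart.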

The remaining nonclustered or dispersed breakpoints are highly
associated with LCRs, showing a 5.2-fold enrichment for the presence
of LCRs at the breakpoint site (P-value = 2.9 × 10^-22; see Fig. 3B).
This is in contrast to the clustered breakpoints that do not exhibit
enrichment for LCRs, with only five out of 68 clustered breakpoints
being LCR-associated, well within the number expected by chance.
Moreover, as illustrated in Figure 3C, the four clustered breakpoint
locations exactly coincide with high copy number gain regions
("firestorms," the term proposed by Hicks et al. [2006]) in the MCF-7
genome described by Jonsson et al. (2007) and contain prognostic gene
markers for breast cancer.
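
For readers who want to see how a fold enrichment and an associated P-value of this kind can be computed, the sketch below uses a one-sided binomial test. This is an illustrative assumption: the text does not state which test was used, nor the background LCR fraction or the number of LCR-associated dispersed breakpoints, so the example inputs are hypothetical and chosen only to reproduce a 5.2-fold ratio.

```python
from math import comb


def fold_enrichment(n_hits, n_total, expected_fraction):
    """Observed fraction of hits divided by the fraction expected by chance."""
    return (n_hits / n_total) / expected_fraction


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); a one-sided enrichment P-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Counts taken from the text.
n_somatic = 157
n_clustered = 68
n_dispersed = n_somatic - n_clustered          # dispersed breakpoints
clustered_lcr_fraction = 5 / n_clustered       # five LCR-associated of 68

# HYPOTHETICAL inputs, for illustration only: 26 hits out of 50 sites
# against an assumed 10% background.
example_fold = fold_enrichment(26, 50, 0.10)
example_p = binom_upper_tail(26, 50, 0.10)

print(n_dispersed, round(100 * clustered_lcr_fraction, 1))
print(round(example_fold, 1), example_p)
```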

To further examine possible differences between the clustered
breakpoints and the dispersed ones, we identified regions that show
recurrent copy number amplification in cancer in previous studies
involving 145 breast tumors and 56 breast cancer cell lines (Chin et
al. 2006; Neve et al. 2006; Shadeo and Lam 2006; Jonsson et al. 2007).
As illustrated in Supplemental Figure 5, almost three-fourths of
breakpoints occurring in the four clusters are highly recurrently
amplified (high recurrence is declared if at least 20% of the surveyed
samples show amplification), a greater than twofold enrichment over
other (dispersed) breakpoints. Additionally, the mean number of
amplifications at each breakpoint location is significantly higher
among clustered vs. dispersed breakpoints. These data suggest that
genomic instability in these cluster regions is not specific to MCF-7.
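
The recurrence criterion above reduces to a simple threshold. The sketch below assumes, for illustration, that the 145 tumors and 56 cell lines are treated as a single surveyed set of 201 samples; the text does not say whether the threshold was applied to the pooled set or per study.

```python
import math

# ASSUMPTION: tumors and cell lines are pooled into one surveyed set.
n_samples = 145 + 56
min_amplified = math.ceil(0.20 * n_samples)  # smallest count reaching 20%


def is_highly_recurrent(n_amplified, n_surveyed=n_samples, threshold=0.20):
    """True if at least `threshold` of surveyed samples show amplification."""
    return n_amplified / n_surveyed >= threshold


print(n_samples, min_amplified)
```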

Novel chimeric transcripts could be predicted based
on fusions of genomic DNA

Among the breakpoint fusions that involved genes, we first focused on
those that occurred within introns and are predicted to lead to
chimeric transcripts. We discovered 10 gene fusions (Table 1) where
fusion breakpoints reside in intronic regions of the genes involved,
implying in-frame translation of the original amino acid sequences.

To determine whether the predicted chimeric mRNA transcripts were
created by these genomic fusions, we performed gene-specific reverse
transcriptase reactions and a fusion-specific PCR on RNA extracted
from MCF-7, MCF-10A, and normal breast tissue (the latter two serving
as negative controls). Since the primers were designed to amplify the
fusion product specifically, a band was only generated if a fusion
product was present (for primer sequences, see Supplemental Table 4).
Out of 10 fusions, four showed a fusion mRNA transcript by RT-PCR
(Fig. 4).

To determine whether other sources reported the same fusion
transcripts in MCF-7, other cell lines, or primary tumors, we queried
70 MCF-7 and HCT116 (colon cancer) paired-end ditag fusion transcript
sets reported by Ruan et al. (2007) and 237 fusion transcripts from
the Cancer Genome Anatomy Project Recurrent Chromosome Aberrations in
Cancer database reported by Hahn et al. (2004). Of the 10 MCF-7 gene
fusions identified by our bridging and outlining method, the
BCAS3-BCAS4 fusion was found to be previously characterized by Ruan et
al. (2007). Interestingly, the BCAS3-BCAS4 fusion is recurrently
present in both the MCF-7 breast cancer and HCT116 colon cancer cell
lines.

Some  of  the  fusions  and  truncations  may  suppress  function 
of  normal  gene  product 

Most fusions involve highly amplified clustered breakpoints,
indicating possible positive selection and therefore functional
significance. This is consistent with the fact that firestorm patterns
indicate poor prognosis (Hicks et al. 2006) and that these highly
amplified regions contain specific prognostic markers (Jonsson et al.
2007). However, not all the amplified loci contain oncogenes. Analysis
and results below indicate that the oncogenic effects of some of the
fusions may in fact be due to a suppression of normal function of a
tumor suppressor gene. Observed amplification of gene fusions
involving tumor suppressors is consistent with a dominant-negative
effect of such gene fusions.

For example, the first two exons of PTPRG, comprising the carbonic
anhydrase-like domain, are replaced by the first 10 exons of the
unannotated inter-species ASTN2 gene. Promoter hypermethylation in
PTPRG in T-cell lymphoma leads to loss of gene expression and
correlates with poor prognosis (van Doorn et al. 2005). Interestingly,
murine L cells producing PTPRG transcripts with a homozygous deletion
of the carbonic anhydrase-like domain cause sarcomas in syngeneic mice
(Wary et al. 1993).

To examine the effects of a possible suppression of SULF2 function by
the ARFGEF2-SULF2 fusion, SULF2 mRNA was knocked down using siRNA
specifically targeting SULF2 in MCF-7B, MDA MB231, and MCF-10A cells
(Supplemental Fig. 6). Proliferation assays were performed on the
three cell lines after SULF2 knock-down, and all exhibited a
proliferative advantage over cells treated with control siRNA (Fig.
5A-C). To determine the effect on survival under stress conditions,
SULF2 siRNA and control siRNA treated cells were plated in serum-free
conditions. Results indicate (Fig. 5D-F) that cells with knocked-down
SULF2 survive better and recover faster (seen by the steeper slope) in
serum-free conditions than the control cells, implying that knock-down
of SULF2 enhances survival. Finally, knock-down of SULF2 mRNA caused a
twofold increase in anchorage-independent growth in MCF-7B and a
threefold increase in MDA MB231, as measured by the number of colonies
compared with controls (Fig. 5H). In summary, the data indicate that
knock-down of SULF2 causes tumorigenic phenotypes, including increased
proliferation, enhanced survival, and increased anchorage-independent
growth. SULF2 may therefore act as a breast cancer suppressor.

Figure 3. (A) Four clusters of breakpoints at 1p13.1-21.1, 3p14.1-p14.2, 17q22-q24.3, and 20q12-q13.33. (B) Low copy repeat (LCR) association with
clustered and dispersed breakpoints. (C) The four clusters of breakpoints correspond exactly to the four highly amplified regions in MCF-7, as determined
by array CGH.


Some  genes  are  involved  in  numerous  rearrangements 

In addition to the 10 gene-gene fusions, a total of 77 genes were
otherwise affected by the 157 breakpoints. We jointly refer to those
events as "truncations" even though some, in fact, involve fusion of
an upstream promoter with a protein coding gene. PTPRG and other genes
were affected by multiple breakpoints, including both fusion
breakpoints and truncation breakpoints. The PTPRG breakpoints occur
within the chromosome 3 breakpoint cluster and coincide with a known
fragile site. Another example is the fusion of the BMP7 promoter
upstream of ZNF217, an oncogene overexpressed in breast cancer
(Collins et al. 2001), which we rediscovered but which was previously
described by Volik et al. (2003, 2006). The chromosome 20
rearrangement hotspot contains 37 breakpoints surrounding the ZNF217
oncogene. Another extreme example of multiple rearrangements is the
breast cancer amplified sequence 3 (BCAS3), occurring within the
chromosome 17 rearrangement hotspot. There are seven breakpoints
located within the intron-exon boundaries and an additional 19
nonfusion breakpoints surrounding the BCAS3 gene region.

Table 1. Gene fusions in MCF-7 that involve splicing of intact coding exons

Associated genes | Rearrangement type | Cytoband translocation | Comment
ARFGEF2-SULF2 | Intrachromosomal inversion | 20q13.13-20q13.13 | Fusion of ARFGEF2 exon 1 to SULF2 exons 3-21; 1.2-Mb inversion
DEPDC1B-ELOVL7 | Intrachromosomal translocation | 5q12.1-5q12.1 | Fusion of DEPDC1B N terminus exons 1-7 (out of 11) with ELOVL7 exons 8-9
RAD51C-ATXN7 | Interchromosomal rearrangement | 3p14.1-17q22 | Fusion of RAD51C exons 1-7 (out of nine) with ATXN7 exons 6-13
SULF2-PRICKLE2 | Interchromosomal rearrangement | 3p14.1-20q13.13 | Fusion of SULF2 exon 1 with last exon of PRICKLE2
NPEPPS-USP32 | Intrachromosomal inversion | 17q21.32-17q23.2 | Fusion of NPEPPS exons 1-12 (out of 23) with USP32 exons 2-4; 13-Mb inversion
ASTN2-PTPRG | Interchromosomal rearrangement | 3p14.2-9q33.1 | Fusion of ASTN2 exons 1-10 (out of 22) with PTPRG exons 3-30
BCAS3-BCAS4 | Interchromosomal rearrangement | 17q23.2-20q13.13 | BCAS4 exon 1 fused to BCAS3 exons 23-24; also found by Ruan et al. (2007)
BCAS3-RSBN1 | Interchromosomal rearrangement | 1p13.2-17q23.2 | Fusion of RSBN1 first exon with BCAS3 exons 6-24
ASTN2-TBC1D16 | Interchromosomal rearrangement | 9q33.1-17q25.3 | Fusion of ASTN2 exons 1-15 with TBC1D16 exons 2-12
BCAS4-PRKCBP1 | Intrachromosomal inversion | 20q13.12-20q13.13 | Fusion of BCAS4 exon 1 with PRKCBP1 exons 5-22; 3.5-Mb inversion

Figure 4. Confirmation of the presence of predicted processed chimeric
mRNA transcripts in MCF-7 using RT-PCR.

Rearrangements  affect  genes  involved  in  homologous 
double-stranded  break  repair 

We identified rearrangements in genes that code for members of protein
complexes involved in double-stranded break repair (DSBR), raising the
possibility that defects in DSBR genes may have contributed to genomic
instability at certain stages of the evolution of the MCF-7 genome.
One of the four MCF-7 gene fusions that produced a detectable
predicted chimeric transcript is an interchromosomal fusion of RAD51C
exons 1-7 to the neuronal-specific gene ATXN7 exons 6-13. RAD51C is a
paralog of RAD51, a gene central to DNA DSBR. RAD51C is an essential
component of a complex reported to be involved in resolving Holliday
junctions (HJs) formed during DSBR (Liu et al. 2007) and as such is
integral to the maintenance of genomic stability. The translocation we
have identified eliminates the domain of RAD51C that binds other
family member homologs such as RAD51D and XRCC3 (Miller et al. 2004),
possibly disrupting formation of the complex responsible for resolving
HJs.

RAD51C is located at 17q23, a region of amplification that has been
extensively studied in MCF-7 cells and breast cancer. One of the most
studied oncogenes in breast cancer, ERBB2, is in close proximity to
the 17q21.2 locus, which is amplified in a number of breast cancers
(but not in MCF-7), often independently of the 17q23 amplification. We
examined RAD51C expression level in the microarray expression data set
involving 50 breast cancer cell lines reported by Neve et al. (2006)
and found that RAD51C levels are elevated in MCF-7, but much lower or
absent in the majority of the other breast cancer cell lines.

We identified a translocation in another gene involved in DSBR,
BRCA1-interacting protein 1 (BRIP1, also termed BACH1). BRIP1 was
originally identified as a helicase-like protein that interacts
directly with BRCA1 and contributes to its DNA repair function. BRIP1
binds to the BRCT repeat in BRCA1. The C terminus of BRIP1 is critical
for its interaction with BRCA1, and a truncation mutant has been shown
to block DSBR (Cantor et al. 2001; Yu et al. 2003; Lewis et al. 2005).
Importantly, germline truncation mutations of BRIP1 have been
identified in familial breast cancer without mutations of BRCA1/2, and
BRIP1 truncations confer a twofold increased risk of developing breast
cancer. We identified a translocation that results in the loss of the
last three exons (exons 18-20); however, the fused DNA (3p14)
downstream of BRIP1 does not contain any exons or introns. The
truncation at exon 17 of BRIP1 would eliminate the C-terminal third of
BRIP1 and eliminate binding to BRCA1. However, it is unclear at
present whether the truncated mRNA would be stable, as there is no
transcription stop site or polyA tail.

Discussion 

We have completed a sequence-level survey of rearrangements in a
cancer genome. One major insight gained from this analysis is the
presence of two types of breakpoints, clustered and dispersed, the
latter being associated with LCRs. While we have not encountered
previous reports of genome-wide association of LCRs with DSBs and
chromosomal instability in tumors, the role of LCRs in promoting
double-strand breaks through the replication fork stalling mechanism
has recently been proposed in the context of genomic disorders (Lee et
al. 2007).

A second major insight is that the two diverse types of breakpoints
may have arisen during different stages of the evolution of the MCF-7
genome. Volik et al. (2006) hypothesized that 20q telomere loss
initiated BFB cycles and a cascade of amplification resulting in small
highly rearranged hotspots that colocalize DNA from different genomic
regions. Our results show the same chromosomal rearrangement
architecture, albeit at higher resolution, and are consistent with the
hypothesis that BFB cycles, possibly including extrachromosomal
amplisomes, played an initial role in MCF-7 genome evolution. The
chromosome 3 rearrangement hotspot encompasses the common fragile site
FRA3B, prone to chromosomal instability, and a mediator of recurrent
BFB amplification found in a variety of human tumors (Hellman et al.
2002). Recurrent breaks within common fragile sites propagated via BFB
cycles amplify oncogenes and promote tumorigenesis (Huebner and Croce
2001; Hellman et al. 2002). Since both the RAD51C-ATXN7 fusion and the
BRIP1 truncation belong to clusters possibly generated by the BFB
mechanism, a possible effect is failure of the HR mechanism of DSBR
and a consequent switch to NHEJ repair at stalled replication forks. A
similar previously observed precedent is the switch from HR to NHEJ in
RAD54 homolog mutants (Sonoda et al. 2006). The switch to NHEJ at some
point in the evolution of MCF-7 would have resulted in a mutator
phenotype (Loeb 2001) and a pattern of extensive chromosomal
rearrangements observed in MCF-7.

The switch to the rearrangement-creating NHEJ would have exposed the
most breakage-prone sites (those containing LCRs) by converting simple
replication-associated breaks into detectable rearrangements. An
analogy here exists between LCRs and DSB repair on one hand and
microsatellites and mismatch repair on the other (Lengauer et al.
1998): By presenting challenges to DNA replication, LCRs and
microsatellites expose weaknesses in DSB repair and mismatch repair
mechanisms, respectively. We should note that our extensive sequencing
did not indicate increased mutability of MCF-7 at the base pair level,
indicating highly functional mismatch repair.

Figure 5. (A-C) Cells treated with SULF2 siRNA have an enhanced proliferation compared with cells treated with control siRNA. MCF-7B (A; Mao et al.
2005), MDA MB231 (B), and MCF-10A (C) cells were transfected with 50 nM SULF2 or control siRNA; 10^4 cells were plated in medium containing 10%
FBS 48 h after transfection of the siRNA. Cells were counted on day 2, 4, 6, and 8. Experiments performed in triplicate; error bars show standard deviation.
(D-F) Cells treated with SULF2 siRNA have an enhanced survival compared with cells treated with control siRNA. MCF-7B (D; Mao et al. 2005), MDA
MB231 (E), and MCF-10A (F) cells were transfected with 50 nM SULF2 or control siRNA; 10^4 cells were plated in serum-free medium 48 h after
transfection of the siRNA. Cells were counted on day 2, 4, and 6. Experiments performed in triplicate. Error bars, SD. (G,H) Treatment of MCF-7B and MDA
MB231 cells with siRNA for SULF2 increases the anchorage-independent growth capabilities. After treatment with siRNA, 10^4 cells were plated in 0.3%
agar in growth medium; MCF-7B colony formation is shown in G. Plates were incubated for 21 d, and colonies were counted; bar chart results shown in H.
Experiments performed in triplicate. Error bars, SD.

The two-stage model also accounts for the typical curve indicating
increase in genome complexity during the typical evolution of a breast
cancer genome (Chin et al. 2004). While the BFB may account for the
steep slope of rise in genomic complexity in MCF-7 during the stage of
in situ carcinoma and telomere crisis, the subsequent instability
mediated by the failure of the homology-based DSB repair mechanism
resulting in breaks at LCR loci may account for the subsequent less
steep slope that typically follows completion of the telomere crisis
stage and accompanies metastasis. The two-stage model is also
consistent with ongoing plasticity of the MCF-7 genome as evidenced by
polyclonality and divergence of MCF-7 sublines (Jones et al. 2000;
Nugoli et al. 2003).

The third insight is the abundance of genes affected by
rearrangements, and particularly of gene fusions, which exceeds
current estimates of the abundance of gene fusions in breast cancer
(Mitelman et al. 2007). Our unbiased screen of MCF-7 cell lines
identified 79 genes involved in rearrangement events. Ten gene fusions
were identified, nine novel and one previously reported by Ruan et al.
(2007), and 77 other fusions involving genes and gene truncations.

The fourth insight is that at least a fraction of genes affected by
fusions and truncations may in fact be tumor suppressors (e.g., PTPRG,
SULF2) or may be responsible for genome stability (e.g., RAD51C,
BRIP1). Both BRIP1 and RAD51C fall within the cluster of breakpoints
at 17q23 and are amplified in MCF-7 cells, indicating possible
positive selection for the amplification. Such positive selection
would be consistent with previously reported dominant-negative effects
observed in genes responsible for genome stability (Milne and Weaver
1993).

The fifth insight is that chimeric transcripts can in fact be
discovered by directly mapping rearrangements at the level of genomic
DNA and then predicting specific chimeric transcripts. This opens the
possibility of discovering recurrent, mechanistically and
prognostically significant rearrangements by simply mapping a
sufficient number of genomes and directly observing recurrent events.

In conclusion, this study validates the utility of mapping
rearrangements in cancer genomes by providing mechanistically
significant insights into cancer evolution and identifying genes
likely involved in cancer progression. Building on the benchmarks
developed in this study, next steps include technological and
methodological improvements that will allow scale-up to whole genomes
and to multiple cell lines and tumor samples at a more affordable
cost, thus broadening applications in the research context and
eventually in clinical settings.

Methods 

Fosmid  library  preparation  and  end-sequencing  of  clone  inserts 

Fosmid  libraries  were  prepared  from  each  of  the  six  96-BAC  pools 
indicated in Figure 1B using the Epicentre EpiFOS Fosmid Library
Production  Kit. 

DNA  sequencing 

End sequences of the fosmid inserts were obtained by Sanger
sequencing on an ABI 3730XL. Approximately 300,000 short (100-bp)
reads were obtained from each of the three 192-BAC pools
indicated in Figure 1B using the 454 Life Sciences (Roche) GS
machine. Detailed sequencing statistics are included in
Supplemental Table 1. The sequencing reads are available for
download from the public project pages at http://www.genboree.org.

Mapping  reads  onto  the  reference  genome 

Fosmid-end  reads,  454  Life  Sciences  (Roche)  shotgun  reads,  and 
BAC-end  reads  were  mapped  onto  the  reference  human  genome 
(March  2006  assembly,  Build  36)  using  the  BLAT  program.  BLAT 
parameters  used  for  mapping  are  described  in  Supplementary 
Materials  and  coordinates  are  available  through  the  Genboree  site 
on  the  Breast  Cancer  project  page  at  http://www.genboree.org. 

PCR  primer  design  pipeline 

PCR primers for amplifying breakpoint regions were designed against
the repeat-masked human genome assembly (March 2006 assembly,
Build 36) using a semi-automated primer design pipeline.
The Primer3 primer design program was run to obtain a set of nested
primers using two categories of parameters, "stringent" and
"relaxed." Primer pairs in each category were scored, and the
highest-scored primer pair was selected for the initial round of PCR
amplification. Priority was also given to the stringent category. In
case of failure, additional lower-scoring primer pairs were employed.
More details, including Primer3 parameters, can be found in
Supplemental materials.
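
The selection order described above can be illustrated with a minimal sketch. It assumes that "priority to the stringent category" means every stringent pair is tried before any relaxed pair, and that a higher score is better; the actual scoring scheme is given only in Supplemental materials. The PrimerPair class, its fields, and the example names are hypothetical and are not part of Primer3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerPair:
    """Hypothetical record for one candidate primer pair."""
    name: str
    category: str  # "stringent" or "relaxed"
    score: float   # assumption: higher score means a better pair


def rank_primer_pairs(pairs):
    """Order candidates for successive PCR attempts.

    Assumption: stringent pairs come first; within a category,
    pairs are tried from highest to lowest score.
    """
    return sorted(pairs, key=lambda p: (p.category != "stringent", -p.score))


candidates = [
    PrimerPair("r1", "relaxed", 9.0),
    PrimerPair("s1", "stringent", 7.0),
    PrimerPair("s2", "stringent", 8.5),
]
attempt_order = [p.name for p in rank_primer_pairs(candidates)]
```

The first element of the ranked list would be used for the initial round of amplification, and later elements serve as fallbacks in case of failure.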

PCR  amplification  of  genomic  DNA  from  cell  lines 

Breakpoint  confirmation  included  PCR  amplification  of  a  pool  of 
genomic  DNA  from  six  different  sublines  of  MCF-7  cells  (B,  BK,  C, 
D, L, and Neo). DNA isolated from immortalized but nontransformed
mammary epithelial cells (MCF-10A) and normal human
female  DNA  (Novagen)  were  used  as  negative  controls.  Genomic 
cell  line  DNA  was  isolated  with  the  DNeasy  kit  (Qiagen).  PCR 
bands  were  visualized  on  a  2%  agarose  gel. 

Breakpoint  clustering  algorithm 

Consecutive breakpoints that were closer than 1.1 Mbp in the
reference genome assembly were connected. Runs of consecutive
connected breakpoints with eight or more members were declared
to constitute a cluster. Four clusters on chromosomes 1, 3, 17, and
20, indicated in Figure 3, were obtained in this fashion.
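
The clustering rule can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not the code used in the study: breakpoints are taken to be (chromosome, position) pairs in reference coordinates, and the function name and output tuple layout are hypothetical.

```python
def cluster_breakpoints(breakpoints, max_gap=1_100_000, min_size=8):
    """Group breakpoints into clusters.

    breakpoints: iterable of (chromosome, position) pairs (assumed format).
    Consecutive breakpoints closer than max_gap bp are connected; runs
    with at least min_size members are reported as
    (chromosome, start, end, member_count).
    """
    by_chrom = {}
    for chrom, pos in breakpoints:
        by_chrom.setdefault(chrom, []).append(pos)

    clusters = []
    for chrom, positions in by_chrom.items():
        positions.sort()
        run = [positions[0]]
        for pos in positions[1:]:
            if pos - run[-1] < max_gap:
                run.append(pos)  # still connected to the current run
            else:
                if len(run) >= min_size:
                    clusters.append((chrom, run[0], run[-1], len(run)))
                run = [pos]      # start a new run
        if len(run) >= min_size:
            clusters.append((chrom, run[0], run[-1], len(run)))
    return clusters


# Toy input: eight breakpoints 1 Mbp apart on chr17, one distant
# breakpoint on chr17, and three nearby breakpoints on chr1.
example = [("chr17", 50_000_000 + i * 1_000_000) for i in range(8)]
example += [("chr17", 90_000_000)]
example += [("chr1", 1_000_000), ("chr1", 1_500_000), ("chr1", 2_000_000)]
found = cluster_breakpoints(example)
```

In this toy input only the chr17 run qualifies, since the chr1 run has fewer than eight members.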

Identification  of  LCR  regions 

Each of the 157 MCF-7 breakpoints was examined for the presence
of LCRs. Intrachromosomal and interchromosomal LCRs were
detected  by  applying  a  novel  algorithmic  method  to  the  human 
genome  assembly  (March  2006  assembly,  Build  36).  The  method 
involved  self-comparison  of  the  human  genome  using  the  Pash 
program (Kalafus et al. 2004) and an automated pipeline for
segmentation, clustering, and parsing of LCRs based on sequence
feature  analysis.  The  LCRs  detected  by  this  method  cover  6.15%  of 
the  whole  genome  in  length,  of  which  18.7%  are  gene-containing 
regions.  A  detailed  description  of  the  algorithm  is  available  in 
Supplemental  materials. 

Analysis  of  recurrent  copy  number  changes  in  157  somatic 
breakpoint  loci 

Copy number variation in the 157 somatic breakpoint loci
identified in this study was examined. In order to identify recurrent
copy  number  changes  in  breakpoint  loci,  array  CGH  data  from  201 
breast  cancer  cell  lines  and  tumors  (Chin  et  al.  2006;  Neve  et  al. 
2006;  Shadeo  and  Lam  2006;  Jonsson  et  al.  2007)  were  integrated. 
A locus was declared recurrently amplified if amplification was
reported in more than 20% of cases for the specific locus. Detailed
results  are  compiled  in  a  table  where  breakpoints  are  sorted  by 
their  level  of  recurrent  copy  number  amplification  (for  details,  see 
Supplemental  materials  and  Supplemental  Table  3). 
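
The 20% recurrence criterion can be expressed as a short sketch. It assumes that the integrated array CGH data have already been reduced to one amplified/not-amplified call per sample at each locus; that input format, the function name, and the locus labels are hypothetical.

```python
def recurrent_amplification(calls, threshold=0.20):
    """Rank loci by amplification frequency.

    calls: dict mapping a locus label to a list of booleans, one per
    sample, where True means amplification was reported (assumed format).
    Returns (locus, frequency, is_recurrent) tuples sorted by decreasing
    frequency; a locus is recurrent when frequency exceeds threshold.
    """
    rows = []
    for locus, flags in calls.items():
        flags = list(flags)
        if not flags:
            continue  # no samples assayed at this locus
        freq = sum(flags) / len(flags)
        rows.append((locus, freq, freq > threshold))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Toy calls for ten samples at two loci.
toy_calls = {
    "locus_A": [True] * 2 + [False] * 8,  # exactly 20%, not "more than"
    "locus_B": [True] * 3 + [False] * 7,  # 30%
}
ranked = recurrent_amplification(toy_calls)
```

A locus amplified in exactly 20% of cases is not flagged, because the criterion requires more than 20%.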

Analysis  of  recurrent  expression  and  copy  number  changes 
in  79  breakpoint-associated  genes 

Patterns  of  recurrent  copy  number  and  expression  level  variation 
were  examined  for  79  genes  associated  with  the  157  somatic 
breakpoints  identified  in  this  study.  Expression  data  from  50 
breast  cancer  cell  lines  (Neve  et  al.  2006)  were  combined  with  copy 
number  data  from  201  breast  cancer  cell  lines  and  tumors  (Chin 
et  al.  2006;  Neve  et  al.  2006;  Shadeo  and  Lam  2006;  Jonsson  et  al. 
2007). Detailed results are compiled in a table where genes are
sorted by their level of recurrent alteration (for details, see
Supplemental Materials and Supplemental Table 2). Additionally,
copy  number  data  from  an  Affymetrix  100k  SNP  chip  were  used  to 
identify  breakpoint  genes  that  also  associate  with  regions  of  copy 
number  alteration  (see  Supplemental  Table  3). 

Detection  of  predicted  fusion  transcripts  by  RT-PCR 

mRNA from exponentially growing MCF-7 and MCF-10A cells
was isolated with the RNeasy kit (Qiagen). To determine the
presence of a fusion transcript, primers were designed across the
fusion point on cDNA using Primer3. Control primers were
designed on either side of the fusion. cDNA was generated
using gene-specific primers. PCR amplification of the mRNA was
restricted to 35 cycles. PCR bands were visualized on a 2% agarose
gel and verified by sequencing to confirm that the product
contained mRNA from both genes involved.

Cell  growth  and  soft-agar  experiments 

For the cell growth experiments, 10,000 cells were plated in
triplicate in 24-well plates. The cells were grown in growth medium
containing 10% FBS or in serum-free medium. Growth rate was
measured on days 0, 2, 4, and 6 with a Coulter Counter (Beckman
Coulter).

Colony growth assays were performed as follows: 1 mL of a
0.5% noble agar solution in growth or serum-free medium was
layered onto 30 × 10-mm tissue culture plates. A total of 1 × 10^4
cells was mixed with 1 mL of 0.3% agar solution prepared in
a similar manner and layered on top of the 0.5% agar layer. Plates
were incubated at 37°C in 5% CO2 for 21 d. The experiment was
performed in triplicate.

Knock-down  of  SULF2  using  short  interfering  RNA  (siRNA) 

Transfections  with  SULF2  and  control  nonspecific  siRNA 
(Dharmacon)  were  carried  out  using  50  nM  pooled  siRNA  duplexes 
and 4 µL of Dharmafect (Dharmacon) in six-well plates according
to the manufacturer's protocol. After 48 h, the cells were prepared
for the respective assays.